Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond|arXiv(2023)
Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan,
Peng Wang, Junyang Lin, Chang Zhou, Jingren Zhou
DOI: https://doi.org/10.48550/arXiv.2308.12966
Large language model (LLM)
Vision-language models (VLMs)
Qwen